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Abstract 

We review some recent learning approaches in variational imaging, based on bilevel optimisation, and 
emphasize the importance of their treatment in function space. The paper covers both analytical and 
numerical techniques. Analytically, we include results on the existence and structure of minimisers, as well 
as optimality conditions for their characterisation. Based on this information, Newton type methods are 
studied for the solution of the problems at hand, combining them with sampling techniques in case of large 
databases. The computational verification of the developed techniques is extensively documented, covering 
instances with different type of regularisers, several noise models, spatially dependent weights and large 
image databases. 
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1 Overview of learning in variational imaging 

A myriad of different imaging models and reconstruction methods exist in the literature and their analysis 
and application is mostly being developed in parallel in different disciplines. The task of image reconstruction 
from noisy and under-sampled measurements, for instance, has been attempted in engineering and statistics (in 
particular signal processing) using filters [60l |73l [28] and multi scale analysis [79l|50l|80], in statistical inverse 
problems using Bayesian inversion and machine learning m and in mathematical analysis using variational 
calculus, PDEs and numerical optimisation HI]. Each one of these methodologies has its advantages and 
disadvantages, as well as multiple different levels of interpretation and formalism. In this paper we focus on 
the formalism of variational reconstruction approaches. 

A variational image reconstruction model can be formalised as follows. Given data / which is related to an 
image (or to certain image information, e.g. a segmented or edge detected image) u through a generic forward 
operator (or function) K the task is to retrieve u from /. In most realistic situations this retrieval is complicated 
by the ill-posedness of K as well as random noise in /. A widely accepted method that approximates this ill- 
posed problem by a well-posed one and counteracts the noise is the method of Tikhonov regularisation. That 
is, an approximation to the true image is computed as a minimiser of 

aR{u)^d{K{u)J), (1) 

where i? is a regularising energy that models a-priori knowledge about the image d(', •) is a suitable distance 
function that models the relation of the data / to the unknown and a > 0 is a parameter that balances our 


1 


trust in the forward uiodel against the need of regularisation. The parameter a in particular, depends on the 
amount of ill-posedness in the operator K and the amount (amplitude) of the noise present in /. A key issue in 
imaging inverse problems is the correct choice of a, image priors (regularisation functionals i?), fidelity terms 
d and (if applicable) the choice of what to measure (the linear or nonlinear operator K). Depending on this 
choice, different reconstruction results are obtained. 

Several strategies for conceiving optimization problems have been considered. One approach is the a-priori 
modelling of image priors, forward operator K and distance function d. Total variation regularisation, for 
instance, has been introduced as an image prior in m due to its edge-preserving properties. Its reconstruction 
qualities have subsequently been thoroughly analysed in works of the variational calculus and partial differential 
equations community, e.g. m Eg m El HU El Eg m] only to name a few. The forward operator in magnetic 
resonance imaging (MRI), for instance, can be derived by formalising the physics behind MRI which roughly 
results in K = ST a sampling operator applied to the Fourier transform. Appropriate data fidelity distances 
d are mostly driven by statistical considerations that model our knowledge of the data distribution gang]. 
Poisson distributed data, as it appears in photography [29] and emission tomography applications [82], is 
modelled by the Kullback-Leibler divergence ca, while a normal data distribution, as for Gaussian noise, results 
in a least squares fit model. In the context of data driven learning approaches we mention statistically grounded 
methods for optimal model design m and marginalization naEsi, adaptive and multiscale regularization 
(zgESEn], learning in the context of sparse coding and dictionary learning IilE3ESllHSlEi, learning image 
priors using Markov Random fields [TgiTTIIM]. deriving optimal priors and regularised inversion matrices by 
analysing their spectrum and many recent approaches that - based on a more or less generic model setup 

such as 0 - aim to optimise operators (i.e., matrices and expansion) and functions (i.e. distance functions d) in 
a functional variational regularisation approach by bilevel learning from ‘examples ’ 01 [311 111 SllTlIlll [331132], 
among others. All these approaches vary in their philosophy and mathematics. The main dividing line is 
between model-based derivation of 0 that uses a-priori knowledge on the data and the image, and data-based 
derivation of 0 that learns the setup of the model from the data itself. 

While functional modelling constitutes a mathematically rigorous and physical way of setting up the re¬ 
construction of an image - providing reconstruction guarantees in terms of error and stability estimates - it is 
limited with respect to its adaptivity for real data. On the other hand, data-based modelling of reconstruction 
approaches is set up to produce results which are optimal with respect to the given data. However, in general 
it neither offers insights into the structural properties of the model nor provides comprehensible reconstruction 
guarantees. Indeed, we believe that for the development of reliable, comprehensible and at the same time 
effective models 0 it is essential to aim for a unified approach that seeks tailor-made regularisation and data 
models by combining model- and data-based approaches. 

To do so we focus on a bilevel optimisation strategy for finding an optimal setup of variational regularisation 
models 0 - That is, given a set of training images we find a setup of 0 which minimises an a-priori determined 
cost functional F measuring the performance of 0 with respect to the training set, compare Section for 
details. The setup of 0 can be optimised for the choice of regularisation R as will be discussed in Section 
for the data fitting distance d as in Section or for an appropriate forward operator K as in blind image 
deconvolution m for example. In the present article, rather than working on the discrete problem, as is done 
in standard parameter learning and model optimisation methods, we discuss the optimisation of variational 
regularisation models in infinite dimensional function space. We will explain this approach in more detail in the 
next section. Before, let us give an account to the state of the art of bilevel optimisation for model learning. In 
machine learning bilevel optimisation is well established. It is a semi-supervised learning method that optimally 
adapts itself to a given dataset of measurements and desirable solutions. In [701 EH El ng, for instance the 
authors consider bilevel optimization for finite dimensional Markov random field models. In inverse problems 
the optimal inversion and experimental acquisition setup is discussed in the context of optimal model design 
in works by Haber, Horesh and Tenorio gaiii], as well as Ghattas et al. mM- Recently parameter learning 
in the context of functional variational regularisation models 0 also entered the image processing community 
with works by the authors [SI EHl El El in El, Kunisch, Pock and co-workers min El, Ghung et al. 
and others jlEl- Interesting recent works also include bilevel learning approaches for image segmentation m 
and learning of support vector machines m 

Apart from the work of the authors m [H E3 El ES m, all approaches so far are formulated and 
optimised in the discrete setting. In what follows, we review modelling, analysis and optimisation of bilevel 
learning approaches in function space rather than on a discretisation of 0 . While digitally acquired image 
data is of course discrete, the aim of high resolution image reconstruction and processing is always to compute 
an image that is close to the real (analogue, infinite dimensional) world. HD photography produces larger 
and larger images with a frequently increasing number of megapixels, compare Figure Hence, it makes 
sense to seek images which have certain properties in an infinite dimensional function space. That is, we aim 
for a processing method that accentuates and preserves qualitative properties in images independent of the 
resolution of the image itself [HS]. Moreover, optimisation methods conceived in function space potentially 
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Figure 1: Camera technology tending towards continuum images? Most image processing and analysis algo¬ 
rithms are designed for a finite number of pixels. But camera technology allows to capture images of higher 
and higher resolution and therefore the number of pixels in images changes constantly. Functional analysis, 
partial differential equations and continuous optimisation allow us to design image processing models in the 
continuum. 


result in numerical iterative schemes which are resolution and mesh-independent upon discretisation |44| . 

Outline of the paper In what follows we focus on bilevel learning of an optimal variational regularisation 
model in function space. We give an account on the analysis for a generic learning approach in infinite dimen¬ 
sional function space presented in [33] in Section]^ In particular, we will discuss under which conditions on the 
learning approach, in particular the regularity of the variational model and the cost functional, we can indeed 
prove existence of optimal parameters in the interior of the domain (guaranteeing compactness), and derive an 
optimal system exemplarily for parameter learning for total variation denoising. Section [^discusses the numer¬ 
ical solution of bilevel learning approaches. Here, we focus on the second-order iterative optimisation methods 
such as quasi and semismooth Newton approaches m, which are combined with stochastic (dynamic) sampling 
strategies for efficiently solving the learning problem even in presence of a large training set m- In Sections 
and we discuss the application of the generic learning model from Section to conceiving optimal regu¬ 
larisation functionals (in the simplest setting this means computing optimal regularisation parameters; in the 
most complex setting this means computing spatially dependent and vector valued regularisation parameters) 
[32l |25] and optimal data fidelity functions in presence of different noise distributions isunni. 


2 The learning model and its analysis in function space 


2.1 The abstract model 


Our image domain will be an open bounded set C with Lipschitz boundary. Our data / lies in Y = 
M"^). We look for positive parameters A = (Ai,..., Am) and a = (ai,..., q^at) in abstract parameters sets 
and They are intended to solve for some convex, proper, weak* lower semicontinuous cost functional 
F : X ^ M the problem 

min F{ua,x) s.t. ^ arg min J(rt; A, a), (P) 

u^x 


for 


M „ 

J{u;X,a):=2. / Xi{x)(l)i{x,[Ku]{x)) dx 

i=i 



OLj{x) d\Aju\{x). 


Our solution u lies in an abstract space X, mapped by the linear operator K to Y. Several further technical 
assumptions discussed in detail in [33] cover A, X, and the (pi. In Section 2.2 of this review we concentrate on 
specific examples of the framework. 

For the approximation of problem 0 we consider various smoothing steps. For one, we require Huber reg¬ 
ularisation of the Radon norms. Secondly, we take a convex, proper, and weak* lower-semicontinous smoothing 
functional H : X ^ [0,oc]. The typical choice that we concentrate on is H{u) = |||Vr4|p. 
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For parameters /i > 0 and 7 G (0, 00], we then consider the problem 

min s.t. i^a,A, 7 ,/x ^ argmin J^^^{u;X,a) 

uEXndomiiH 


(p7,M) 


A, (a) := fiH{u) + / \i{x)(j)i{x^ [Ku]{x)) dx + (i| 74 ji 4 |^(x). 

i=i i=i 

Here we denote by the Huberised total variation measure per the following definition. 


Definition 2.1. Given 7 G ( 0 ,oc], we define for the norm || • II 2 on the Huber regularisation 


\9\i ~ 


\\9h-^. 

ilblli, 


llfi-lb > 1/7, 

II 5 II 2 < 1/7- 


Then if z/ = fC^ + is the Lebesgue decomposition of z/ G into the absolutely continuous part fjC^ 

and the singular part z/^, we set 

WUiV):= [ \f{x)Udx+W^\{V), {VeBrn. 

Jv 

The measure |z/|^ is the Huber-regularisation of the total variation measure \v\. 

In all of these, we interpret the choice 7 = oc to give back the standard unregularised total variation measure 
or norm. 


2.2 Existence and structure: L^-squared cost and fidelity 


We now choose 


F{u) = -\\Ku-f4l, 


and 


= -\f{x) -v\- 


( 2 ) 


with M = 1 . We also take = {I}? we do not look for the fidelity weights. Our next results state for 
specific regularisers with discrete parameters a = (cti,..., ctw) ^ = [ 0 ,oo]^, conditions for the optimal 

parameters to satisfy ct > 0 . Observe how we allow infinite parameters, which can in some cases distinguish 
between different regularisers. 

We note that these results are not a mere existence results; they are structural results as well. If we had an 
additional lower bound 0 < c < a in Q, we could without the conditions ^ for TV and Q for TGV^ [10] 
denoising, show the existence of an optimal parameter a. Also with fixed numerical regularisation (7 < 00 and 
/i > 0), it is not difficult to show the existence of an optimal parameter without the lower bound. What our 
very natural conditions provide is existence of optimal interior solution a > 0 to ( [P|) without any additional 
box constraints or the numerical regularisation. Moreover, the conditions @ and^Q guarantee convergence 
of optimal parameters of the numerically regularised problems ( P^’^[ ) to a solution of the original BV(f^) 
problem 0 - 

Theorem 2.2 (Total variation Gaussian denoising Suppose /,/o € BV(fl) ni^(n), and 


TV(/) > TV(/o). 


(3) 


Then there exist /u, 7 > 0 such that any optimal solution G [0, 00 ] to the problem 

min ,h/o-«a|li2(n) 

aG[0,ooJ Z ^ ' 

with 

e argmin(-||/-u|||2(n) + a\Du\^{n) + y||Vu|||2(n;En)) 

satisfies > 0 whenever /i G [0,/l], 7 G [7, 00]. 

This says that for the optimal parameter to be strictly positive, the noisy image / should, in terms of the 
total variation, oscillate more than the noise-free image /o - exactly what we would naturally expect! 

First steps of proof: modelling in the abstract framework. The modelling of total variation is is based on the 
choice of K as the embedding of V = BV(fl) fl L‘^{Q) into Y = and Ai = D. For the smoothing term 

we take H{u) = 11| V'z;||^ 2 ('^.]^np For the rest of the proof we refer to [33]. □ 
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Theorem 2.3 (Second-order total generalised variation Gaussian denoising [ 33 ]). Suppose that the data /, /o G 
L‘^{yt) n BV(^l) satisfies for some 02 > 0 the condition 

TGV^.„i)(/)>TGV^a.,i)(/o)- (4) 

Then there exists /i,7 > 0 such any optimal solution to the problem 


with 

{va,Wa) e argmin (h/- + ai\Dv - w\^{Q.) + a2\Ew\^{n) 

veBV{Q) 

wGBD(O) 

satisfies (<a ^,^)2 > 0 whenever /i G [0,/2], 7 G [7,0c]. 

Here we recall that BD(f]) is the space of vector fields of bounded deformation [ 78 ]. Again, the noisy data 
has to oscillate more in terms of TGV^ than the ground-truth does, for the existence of an interior optimal 
solution to ©• This of course allows us to avoid constraints on a. 

Observe that we allow for infinite parameters a. We do not seek to restrict them to be finite, as this will 
allow us to decide between TGV^, TV, and TV^ regularisation. 

First steps of proof: modelling in the abstract framework. To present TGV^ in the abstract framework, we take 
take X = (BV(fl) x BD(f^), and Y = L‘^{Q). We denote u = {v,w), and set 

K{v,w)=v, Aiu = Dv — w^ and A2U = Ew 


for E the symmetrised differential. For the smoothing term we take 

H{u) = - ||(Vr’, Vre)|||2('f^.]^nxRriXn). 

For all the gory details we again point the reader to | 33 | . 


□ 


We also have a result on the approximation properties of the numerical models as 7 ^ 00 and /i \ 0 . 
Roughly, the the outer semicontinuity [ 69 ] of the solution map S in the next theorem means that as the 
numerical regularisation vanishes, any optimal parameters for the regularised models (P^’^) tend to some 
optimal parameters of the original model 0- 


Theorem 2.4 ([ 33 ]). In the setting of Theorem | 2 . 2 | and Theorem 2 . 3 , there exist 7 G ( 0 , 00) and /i G ( 0 , oc) 
such that the solution map 

(7, /u) H> 

is outer semicontinuous within [7,00] x [0,/r]. 

We refer to [ 33 ] for further, more general results of the type in this section. These include analogous of the 
above ones for a novel Huberised total variation cost functional. 


2.3 Optimality conditions 

In order to compute optimal solutions to the learning problems, a proper characterization of them is required. 
Since 


(P 




constitute PDF-constrained optimisation problems, suitable techniques from this field may be 
utilized. For the limit cases, an additional asymptotic analysis needs to be performed in order to get a sharp 
characterization of the solutions as 7 ^ oc or /i ^ 0, or both. 

Several instances of the abstract problem (P^’^) have been considered in previous contributions. The case 
with Total Variation regularization was considered in m in presence of several noise models. There the 
Gateaux differentiability of the solution operator was proved, which lead to the derivation of an optimality 
system. Thereafter an asymptotic analysis with respect to 7 ^ 00 was carried out (with /i > 0 ), obtaining an 
optimality system for the corresponding problem. In that case the optimisation problem corresponds to one 
with variational inequality constraints and the characterization concerns C-stationary points. 

Differentiability properties of higher order regularisation solution operators were also investigated in [ 32 ]. A 
stronger Frechet differentiability result was proved for the TGV^ case, which also holds for TV. These stronger 
results open the door, in particular, to further necessary and sufficient optimality conditions. 
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For the general problem ), using the Lagrangian formalism the following optimality system is obtained: 


/i / (Vi^, V'i;) dx + ^ / Xi(l)[{Ku)Kv dx 

Jn Jq 


N 

■ / aj{h^{Aju), Ajv) dx = 0, \/v e V, (5) 


f M ^ 

II / (Vp,Vi;)dx +V / {Xi(l)-{Ku)Kp,Kv)dx 

Jn Jn 


N 

o‘j{h'*{Aju)Ajp,Aj 

v) dx = —F'{u)v, 

Vv G V, (6) 

Xi) dx > 0, 

VC > 0, 1 = 1,... 

,M, 

(7) 

■ aj) dx > 0, 

, Vr? > 0, i = 1,.. 

..,N, 

(8) 


[ UKu)Kp{ 

Jn 

/ h^{Aju)Ajp 

Jn 

where V stands for the Sobolev space where the regularised image lives (typically a subspace of 
with suitable homogeneous boundary conditions), p G V stands for the adjoint state and is a regularized 
version of the TV subdifferential, for instance, 


h^{z) := < 


^ if 71^1 - 1 > ^ 

^(i-i(i-7|^| + ^)2) if 7N-ie(-^,^) 
iz if 7|z| - 1 < 


(9) 


This optimality system is stated here formally. Its rigorous derivation has to be justified for each specific 
combination of spaces, regularisers, noise models and cost functionals. 

With help of the adjoint equation also gradient formulas for the reduced cost functional J^{X^a) := 
^('^a,A 5 ^5 are derived: 


(VA*F)i = / (j)i{Ku)Kpdx, {SaJ^)j = / h^{Aju)Ajpdx, 

Jn Jn 


( 10 ) 


for i = 1,..., M and j = 1,..., V, respectively. The gradient information is of numerical importance in the 
design of solution algorithms. In the case of finite dimensional parameters, thanks to the structure of the 
minimisers reviewed in Section 2, the corresponding variational inequalities 0 and ([^ turn into equalities. 
This has important numerical consequences, since in such cases the gradient formulas (10) may be used without 
additional projection steps. This will be commented in detail in the next section. 


3 Numerical optimisation of the learning problem 


3.1 Adjoint based methods 

The derivative information provided through the adjoint equation 0 may be used in the design of efficient 
second-order algorithms for solving the bilevel problems under consideration. Two main directions may be 
considered in this context: Solving the original problem via optimisation methods [IlISllEg, and solving the 
full optimality system of equations [SH [25] . The main advantage of the first one consists in the reduction of the 
computational cost when a large image database is considered (this issue will be treated in detail below). When 
that occurs, the optimality system becomes extremely large, making it difficult to solve it in a manageable 
amount of time. The advantage of the second approach, on the other hand, consists in the possibility of using 
efficient (possibly generalized) Newton solvers, which have been intensively developed in the last years. 

Let us first describe the quasi-Newton methodology considered in [mEg and further developed in [32]. For 
the design of a quasi-Newton algorithm for the bilevel problem with, e.g., one noise model (Ai = 1), the cost 
functional has to be considered in reduced form as J^{a) := F{ucx^a)^ where Ucx is implicitly determined by 
solving the denoising problem 


Ua = arg min 
uev 



/i > 0. 


( 11 ) 
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Using the gradient formula for T^ 


= [ h^{Aju)Ajpdx, j 

Jn 




the BFGS matrix may be updated with the classical scheme 

Bk^k G BkSk Zk G Zk 


Bk+i = Bk — 


{BkSk')^k) {Zk'^Sk) 


( 12 ) 


(13) 


where Sk = Zk = and {w 0 v)ip := {v^ ip)w. For the line search strategy, 

a backtracking rule may be considered, with the classical Armijo criteria 

+ tkd^^^) - p G (0,1], (14) 

where stands for the quasi-Newton descent direction and tk the length of the quasi-Newton step. We 
consider, in addition, a cyclic update based on curvature verification, i.e., we update the quasi-Newton matrix 
only if the curvature condition (zk^Sk) > 0 is satisfied. The positivity of the parameter values is usually 
preserved along the iterations, making a projection step superfluous in practice. In more involved problems, 
like the ones with TGV^ or ICTV denoising, an extra criteria may be added to the Armijo rule, guaranteeing 
the positivity of the parameters in each iteration. Experiments with other line search rules (like Wolfe) have also 
been performed. Although these line search strategies automatically guarantee the satisfaction of the curvature 
condition (see, e.g., IH21), the interval where the parameter tk has to be chosen appears to be quite small and 
is typically missing. 

The denoising problems 0 may be solved either by efficient first- or second-order methods. In previous 
works we considered primal-dual Newton type algorithms (either classical or semismooth) for this purpose. 
Specifically, by introducing the dual variables qi, i = 1 ,...,^", a necessary and sufficient condition for the 
lower level is given by 


fi / (yu,Vv)dx^'S^ / {qi,Aiv)dx^ / {(j)'{u),v) dx = 

Jq Jq Jn 

qi = ai h^{Aiu) a.e. in U, i = I, 


.,...,7V, 


yv e U, (15) 

(16) 


where h^{z) := rnax(i% \z\) ^ regularized version of the TV subdifferential, and the generalized Newton step 

has the following Jacobi matrix 


T + 0"(^) 

-ai ^{A^u) - 

-a^[9T(A^^)-Xiv -"ff4r 






A\ 

I 

0 

0 


0 

0 

I 


(17) 


where L is an elliptic operator, Xi(x) is the indicator function of the set {x : ^\Aiu\ >1} and Tt(A^i4) := 
, for i = I,..., V. In practice, the convergence neighbourhood of the classical method is too small 
and some sort of globalization is required. Following [33] a modification of the matrix was systematically 
considered, where the term is replaced by max(^g | a ) ^ \A^u\‘^ ' resulting algorithm exhibits both 

a global and a local superlinear convergent behaviour. 

For the coupled BFGS algorithm a warm start of the denoising Newton methods was considered, using the 
image computed in the previous quasi-Newton iteration as initialization for the lower level problem algorithm. 
The adjoint equations, used for the evaluation of the gradient of the reduced cost functional, are solved by 
means of sparse linear solvers. 

Alternatively, as mentioned previously, the optimality system may be solved at once using nonlinear solvers. 
In this case the solution is only a stationary point, which has to be verified a-posteriori to be a minimum of the 
cost functional. This approach has been considered in [53] and [25] for the finite- and infinite-dimensional cases, 
respectively. The solution of the optimality system also presents some challenges due to the nonsmoothness of 
the regularisers and the positivity constraints. 

For simplicity, consider the bilevel learning problem with the TV-seminorm, a single Gaussian noise model 
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and a scalar weight a. The optimality system for the problems reads as follows 


li / (Vu^Vv)dx^ / ah.y{Vu)Vv dx ^ / {u — f)v dx = e V, 

t/ O O t/ o 

li / (Vp, V'i;)dx+ / (a(h!^*(Vi4)Vp, Vi;) (ix+ / pvdx 

Jq Jq Jq 


= -F'{u)v, \/v G y, 


Cr = / {hry{Vu)^Vp) dx. 

Jn 

cr > 0, > 0, cr • a = 0. 


(18a) 

(18b) 

(18c) 

(18d) 


where is given by, e.g., equation The Newton iteration matrix for this coupled system has the following 
form: 

/ L+V*Q;(*)y(Vw'')V 0 V*h^{Vu’^) 

V*q;(*)/i"(Vw*) VpV + F"(u'=) L + V*Q;('')y (Vu'')V V*y(Vu'')Vp 

V h'^{Vu'^)VpV h^{Vvi)V 0 

The structure of this matrix leads to similar difficulties as for the denoising Newton iterations described above. 
To fix this and get good convergence properties, Kunisch and Pock m proposed an additional feasibility step, 
where the iterates are projected on the nonlinear constraining manifold. In [25], similarly as for the lower 
level problem treatment, modified Jacobi matrices are built by replacing the terms h'^{uk) in the diagonal, 
using projections of the dual multipliers. Both approaches lead to globally convergent algorithm with locally 
superlinear convergence rates. Also domain decomposition techniques were tested in [25] for the efficient 
numerical solution of the problem. 

By using this optimize-then-discretize framework, resolution independent solution algorithms may be ob¬ 
tained. Once the iteration steps are well specified, both strategies outlined above use a suitable discretization 
of the image. Typically a finite differences scheme with mesh size step h > 0 is used for this purpose. The 
minimum possible value of h is related to the resolution of the image. For the discretization of the Laplace 
operator the usual five point stencil is used, while forward and backward finite differences are considered for the 
discretization of the divergence and gradient operators, respectively. Alternative discretization methods (finite 
elements, finite volumes, etc) may be considered as well, with the corresponding operators. 



3.2 Dynamic sampling 


For a robust and realistic learning of the optimal parameters, ideally, a rich database of K images, LC ^ 1 


should be considered (like, for instance, MRI applications, compare Section 5.1). Numerically, this consists 


in solving a large set of nonsmooth PDE-constraints of the form (15)- (16) in each iteration of the BFGS 
optimisation algorithm 

In [18] we extended to our imaging framework a dynamic sample size stochastic approximation method 
proposed by Byrd et al. [16]. The algorithm starts by selecting from the whole dataset a sample S whose size 
\S\ is small compared to the original size K. In the following iterations, if the approximation of the optimal 
parameters computed produces an improvement in the cost functional, then the sample size is kept unchanged 
and the optimisation process continues selecting in the next iteration a new sample of the same size. Otherwise, 
if the approximation computed is not a good one, a new, larger, sample size is selected and a new sample S of 
this new size is used to compute the new step. The key point in this procedure is clearly the rule that checks 
throughout the progression of the algorithm, whether the approximation we are performing is good enough, 
i.e. the sample size is big enough, or has to be increased. Because of this systematic check, such sampling 
strategy is called dynamic. Denoting by the solution of (p!^-(16) and by /q the ground-truth images for 
every /c = 1,..., iF, we consider now the reduced cost functional iF{a) in correspondence of the whole database 




K 


2K 


twi^, 


k=l 


we consider, for every sample S C {l,...,Lr}, the batch objective function: 

-^s(a) :=^ElK-/olli- 

I I kes 

As in [16], we formulate in [18] a condition on the batch gradient \/J-s which imposes in every stage of the 
optimisation that the direction —\/Fs is a descent direction for at a if the following condition holds: 


IIVJ-s(^) - V^(a)|U2 < ^||VJ-5 (^)||l^, 0 e [0,1). 


(19) 
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Algorithm 1 Dynamic Sampling BFGS 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 


Initialize: ao, sample <So with \So\ K and model parameter 0, k = 0. 
while BFGS not converging, k > 0 
sample \Sk\ PDF constraints to solve; 
update the BFGS matrix; 

compute direction dk by BFGS and steplength tk by Armijo cond. (14); 
define new iterate: a/c+i = afc + tkdk’, 
if variance condition is satisfied then 
maintain the sample size: IaS/c+iI = \Sk\] 
else augment Sk such that condition variance condition is verified, 
end 



(a) original (b) noisy (c) R(u) = ||Vi^||i (d) R(u) = |Di^|(Q) 


Figure 2: The effect of the choice of regularisation in 0: Choosing the norm squared of the gradient of u as 
a regulariser imposes isotropic smoothing on the image and smoothes the noise equally as blurring the edges. 
Choosing the total variation (TV) as a regulariser we are able to eliminate the noise while preserving the main 
edges in the image. 


The computation of may be very expensive for applications involving large databases and nonlinear 
constraints, so we rewrite (19) as an estimate of the variance of the random vector \/J-s{o:)- We do not report 
here the details of the derivation of such estimate, but we refer the interested reader to m Section 2]. Here, 
we just underline that through such a condition on the variance one can control in each iteration of the BFGS 
optimisation whether the sampling approximation is accurate enough and, if this is not the case, a new larger 
sample size may be determined in order to reach the desired level of accuracy, depending on the parameter 0 
in (19). 

In order to improve upon the traditional slow convergence drawback of steepest descent methods, we com¬ 
bined the Dynamic Sampling strategy described above with BFGS method (13), as described in Algorithm 

m 


4 Learning the image model 

One of the main aspects of discussion in the modelling of variational image reconstruction is the type and 
strength of regularisation that should be imposed on the image. That is, what is the correct choice of regularity 
that should be imposed on an image and how much smoothing is needed in order to counteract imperfections in 
the data such as noise, blur or undersampling. In our variational reconstruction approach 0 this boils down to 
the question of choosing the regulariser R(u) for the image function u and the regularisation parameter a. In 
this section we will demonstrate how functional modelling and data learning can be combined to derive optimal 
regularisation models. To do so, we focus on Total Variation (TV) type regularisation approaches and their 
optimal setup. The following discussion constitutes the essence of our derivations in [32], including an extended 
numerical discussion with an interesting application of our approach to cartoon-texture decomposition. 

4.1 Total variation type regularisation 

The TV is the total variation measure of the distributional derivative of [3], that is for u defined on ft 

TV{u) = \Du\{n) = [ d\Du\. (20) 

Jn 

As the seminal work of Rudin, Osher and Fatemi m and many more contributions in the image process¬ 
ing community have proven, a non-smooth first-order regularisation procedure as TV results in a nonlinear 
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Figure 3: TV image denoising and the staircasing effect: (1.) noisy image, (m.) denoised image, (r.) detail of 
the bottom right hand corner of the denoised image to visualise the staircasing effect (the creation of blocky-like 
patterns due to the first-order regulariser). 


smoothing of the image, smoothing more in homogeneous areas of the image domain and preserving character¬ 
istic structures such as edges, compare Figure ?? More precisely, when TV is chosen as a regulariser in Q 


the reconstructed image is a function in BV the space of functions of bounded variation, allowing the image to 
be discontinuous as its derivative is defined in the distributional sense only. Since edges are discontinuities in 
the image function they can be represented by a BV regular image. In particular, the TV regulariser is tuned 
towards the preservation of edges and performs very well if the reconstructed image is piecewise constant. 

Because one of the main characteristics of images are edges as they define divisions between objects in a 
scene, the preservation of edges seems like a very good idea and a favourable feature of TV regularisation. 
The drawback of such a regularisation procedure becomes apparent as soon as images or signals (in ID) are 
considered which do not only consist of constant regions and jumps, but also possess more complicated, higher- 
order structures, e.g. piecewise linear parts. The artefact introduced by TV regularisation in this case is called 
staircasing [SH], compare Figure]^ 

One possibility to counteract such artefacts is the introduction of higher-order derivatives in the image regu¬ 
larisation. Here, we mainly concentrate on two second-order total variation models: the recently proposed Total 
Generalized Variation (TGV) [10] and the Infimal-Convolution Total Variation (ICTV) model of Chambolle 
and Lions [21]. We focus on second-order TV regularisation only since this is the one which seems to be most 
relevant in imaging applications EaE]. For C open and bounded, the ICTV regulariser reads 


lCTYa,i3{u) min ^(^\\Du — Vv\\m(^q.^ 2 ^^\\DVv\\^(^Q.^ 2 y 

VveBV(n) 

On the other hand, second-order TGV mm reads 

TGV^_^(m) := min Q;||L)M-«;||^(n;R2) +/?||-B«;||^(n;Sym2(E2)). 

VU ^ -D ( u 6 ) 


b- 


( 21 ) 


( 22 ) 


Here BD(f]) := {w G \ \\Ew\\M{n-,R^x^) < co} is the space of vector fields of bounded deformation on 

E denotes the symmetrised gradient and Sym^(M^) the space of symmetric tensors of order 2 with arguments 
in The parameters (a,/3 are fixed positive parameters. The main difference between ( [21] ) and ( [2^ is that 
we do not generally have that re = Vu for any function v. That results in some qualitative differences of 
ICTV and TGV regularisation, compare for instance [6]. Substituting aR{u) in 0 by aTV{u), TGV^^^(m) or 
lCTya,g{u) gives the TV image reconstruction model, TGV image reconstruction model and the ICTV image 
reconstruction model, respectively. 


4.2 Optimal parameter choice for TV type regularisation 

The regularisation effect of TV and second-order TV approaches as discussed above heavily depends on the 
choice of the regularisation parameters a (i.e. ((a,/3) for second-order TV approaches). In Figures]^ andwe 
show the effect of different choices of a and p in TGV^ denoising. In what follows we show some results from 
|32| applying the learning approach to find optimal parameters in TV type reconstruction models, as 


well as a new application of bilevel learning to optimal cartoon-texture decomposition. 


Optimal TV, TGV^ and ICTV denoising We focus on the special case oi K = Id and L^-squared cost 
E and fidelity term ^ as introduced in Section 2.2 In [33] |32] we also discuss the analysis and the effect of 


Huber regularised costs, but this is beyond the scope of this paper and we refer the reader to the respective 
papers. We consider the problem for finding optimal parameters for the variational regularisation model 


ia,/3) G argmini?(„,^)('u) + ||m -/||i2(n), 

u^X 


where / is the noisy image, R(a,g) is either TV in (20) multiplied by a (then f3 is obsolete), TGV^^,^^) in (22) 
or ICTV(^a,i 3 ) ill ( [^Tj ). We employ the framework of (P^’^) with a training pair (/o, /) of original image /o and 
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(a) Too low /3 / High oscillation (b) Optimal [3 (c) Too high (3 / almost TV 

Figure 4: Effect of P on TGV^ denoising with optimal a 



(a) Too low a, low /3. (b) Too low a, optimal (3. (c) Too high o, high (3. 

Good match to noisy data optimal TV^-like behaviour Bad TV^-like behaviour 


Figure 5: Effect of choosing a too large in TGV^ denoising 


noisy image /, using L^-squared cost E^2(u) := ^||/o — As a first example we consider a photograph 

of a parrot to which we add Gaussian noise such that the PSNR of the parrot image is 24.72. In Figure 
we plot by the red star the discovered regularisation parameter ((a*,/3*) reported in Figure]^ Studying the 
location of the red star, we may conclude that the algorithm managed to find a nearly optimal parameter in 
very few BFGS iterations, compare Table 

Optimizing cartoon-texture decomposition using a sketch It is not possible to distinguish noise from 
texture by the G-norm and related approaches [58]. Therefore, learning an optimal cartoon-texture decompo¬ 
sition based on a noise image and a ground-truth image is not feasible. What we did instead, is to make a 
hand-drawn sketch as our expected “cartoon” /o, and then use the bi-level framework to find the true “cartoon” 
and “texture” as split by the model 

J{u,v;a) = ^\\f - u - v\f + q;i||v||kr + a2TV(u) 

for the Kantorovich-Rubinstein norm of (55). For comparison we also include basic TV regularisation results, 
where we define v = f — u. The results for two different iages are in Figure and Table and Figure and 
Table respectively. 


Table 1: Quantified results for the parrot image {£ = 256 = image width/height in pixels) 


Denoise 

Cost 

Initial (a, /3) 

Result (q;*,/3*) 

Cost 

SSIM 

PSNR 

Its. 

Fig. 

TGV^ 

L| 

^Tv) 

(0.058/^^,0.041/^) 

6.412 

0.890 

31.992 

11 


7 

b| 

ICTV 



(0.051/^2 0.041/Q 

6.439 

0.887 

31.954 

7 



c 

TV 

Ll 

0.1 A 

0.042/e 

6.623 

0.879 

31.710 

12 


7 

a| 


11 















































Figure 6: Cost functional value for the 
cost functional plotted versus (<a, /d) for 
TGV^ denoising. The illustration is a con¬ 
tour plot of function value versus ((a,/3). 


Table 2: Quantified results for cartoon-texture decomposition of the parrot image {i = 256 = 

image width/height in pixels) 


Denoise 

Cost 

Initial a 

Result a* 

Value 

SSIM 

PSNR 

Its. 

Fig. 

KRTV 

Li 

(«Tv/®Tv) 

0 . 006 /i 

81.245 

0.565 

9.935 

11 


8 


TV 

Li 

0.1/e 

0.311/^ 

81.794 

0.546 

9.876 

7 


5 

1 


5 Learning the data model 

The correct mathematical modelling of the data fidelity terms (pi^i = 1,..., M in 0 is crucial for the design 
of a realisitc denoising model. Their choice corresponds to physical and statistical properties of the noise 
distribution corrupting the ground-truth /o and varies significantly depending on applications. Typically, the 
noise is assumed to be additive, Gaussian-distributed with 0 mean and variance determining the noise 
intensity. This assumption is reasonable in most of the applications because of the Central Limit Theorem. 
However, there are cases where this modelling assumption does not correspond to the actual statistical properties 
characterising the physics of the application considered. For instance, when considering astronomical images, 
different physical properties corresponding to the quantised (discrete) nature of light and to the independence 
of photons detection lead to consider a Poisson noise distribution, which is signal dependent. Impulse noise 
seems to be more appropriate for modelling transmission errors affecting only some of the pixels in the image. 
For those pixels, the intensity value of the signal is switched to either the maximum/minimum value of the 
dynamic range of the image intensity or to a random value, with positive probability. 

For what follows, we will focus on these three noise distributions and on their possible combination. Other 
distributions can be considered as well: in general, they suit specific applications (like radar or medical ultra¬ 
sound images) where intrinsically the noise corrupting the image cannot be considered signal-independent. 

From a mathematical point of view, variational models reflecting the statistical properties of the noise have 
been derived for the design of consistent denoising models. Starting from the pioneering work of Rudin, Osher 
and Fatemi m, in the case of Gaussian noise a L^-type data fidelity p is typically considered. In the case of 
impulse noise, a variational model based on the use of the norm has been considered in m- statistically, 
this corresponds to consider a Laplace distribution. Poisson noise-based models have been considered in several 
papers by approximating such distribution with a weighted-Gaussian distribution through variance-stabilising 
techniques [73 [13]. In [75] a statistically-consistent analytical modelling for Poisson noise distributions has 
been derived: this results in a Kullback-Leibler-type fidelity. 

As a result of different physical factors, very often in applications the presence of different noise distributions 

Table 3: Quantified results for cartoon-texture decomposition of the Barbara image {i = 256 = 

image width/height in pixels) 


Denoise 

Cost 

Initial a 

Result d* 

Value 

SSIM 

PSNR 

Its. 

Fig. 

KRTV 

Li 

(«Tv/^j«Tv) 

0.423li 

97.291 

0.551 

8.369 

6 
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TV 

Li 

0 . 1 /e 

0.563/^ 

97.205 

0.552 

8.377 

7 


5 
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(a) Noisy image 


(b) TGV^ denoising, L2 cost 


(c) ICTV denoising, L| cost 


Figure 7: Optimal denoising results for initial guess a = 
for TGV^ and ICTV, and a = 0.1/^ for 
TV 


(a) TV denoising, cost 



has to be considered as well. In [43] a combined L^-L‘^ TV-based model is considered for impulse and Gaussian 
noise removal. A two-phase approach is considered in m where the selection of the jI? term is performed 
depending on the intensity of the noise. In general, though, the literature on these combined noise models is 
rather scarse. Gaussian-Poisson noise mixture has been considered in several papers from different point of 
views: in [JH] the exact log-likelihood estimator of the model is derived and then computed via a primal-dual 
splitting, while in other works (see, e.g., [38]) the discrete-continuous nature of the model (due to the Poisson- 
Gaussian component, respectively) is approximated by neglecting or modifying one of the two noise models, 
typically by means of variance-stabilising techniques or a weighted-L^ approximation. 


We now proceed differently from Section |2.2| and focus on the modelling of the optimal fidelity terms (pi 
best fitting the acquired data, providing some examples for the single and multiple noise estimation case. In 
particular, we focus on the estimation of the optimal fidelity weights \i,i = 1,... ,M appearing in © and 


(P'^’^), focu sing on the Total-Variation regularisation ( |20| ) only applied to denoising problems. Compared to 
Section 2.1, this corresponds to fix = {1} and K = Id. We base our presentation on [3T1[T8], where a careful 
analysis in term of well-posedness of the problem and derivation of the optimality system in this framework is 
carried out. 


Shorthand notation In order not to make the notation too heavy, we warn the reader that we will use a 
shorthand notation for the quantities appearing in the regularised problem ( P^’^| ), that is we will write 4>^(u) 
for the data fidelities (pi{x,v),i = 1 ..., M and u for i^a, 7 ,/x 5 the minimiser of J^’^(-; A). 


5.1 Single noise estimation 

In this section we consider the one-noise distribution case (M = 1) where we aim to determine the constant 
optimal fidelity weight A by solving the following optimisation problem: 

“>o ^11-^0 ““lib (23a) 
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(d) KRTV denoising, cost 


(b) Original image 


(c) Cartoon sketch 


(e) TV denoising, cost (f) Texture component for KRTV (g) Texture component for TV 


Qf^y) for KRTV and a 


Figure 8: Optimal sketch-based cartoonification for initial guess a 
for TV 


(c) KRTV denoising, cost 


(a) Original image 


(b) Cartoon sketch 


(d) TV denoising, cost (e) Texture component for KRTV (f) Texture component for TV 


Qf^y/^, Qf^y) for KRTV and a 


Figure 9: Optimal sketch-based cartoonification for initial guess a 
for TV 






subject to (compare 0) 


IJ.(Vu,\/{v — u))L 2 X / ^'{u){v — u)dx 

Jq 

+ [ \\\/v\\dx- [ ||V^||^ix>0 for 3l\v e (23b) 

Jq Jq 

where the fidelity term ^ will change according to the different noise distributions considered and the pair 
(/o, /) is the training pair composed by a noise-free and noisy version of the same image, respectively. 

Note that in the case the noise level is known there are classical techniques in inverse problems for choosing 
an optimal parameter A* in a variational regularisation approach, e.g. the discrepancy principle or the L- 
curve approach [36]. In our discussion we do not use any knowledge of the noise level but rather extract this 
information indirectly from our training set and translate it to the optimal choice of A. As we will see later 
such an approach is also naturally extendable to multiple noise models as well as inhomogeneous noise. 


Gaussian noise We start by considering (23) for determining the regularisation parameter in the standard 
TV denoising model assuming that the noise in the image is normally distributed. In this case the fidelity term 
reads <I>(i4) = \u — f\‘^. The optimisation problem 23 takes the following form: 


min 

A>0 




(24a) 


subject to: 

V(u — rt ))^2 + / X{u — f){v — u) dx 

Jq 

+ [ \\Vv\\dx- [ \\Vu\\dx>0,\fv e (24b) 

J Q J Q 

For the numerical solution of the regularised variational inequality we use a primal-dual algorithm presented 
in [44] . 

As an example, we compute the optimal parameter A* in for a noisy image distorted by Gaussian 
noise with zero mean and variance 0.02 . Results are reported in Figure The optimisation result has been 
obtained for the parameter values /i = le — 12,7 = 100 ^^0 ^ = 1/177. 




Figure 10: Noisy (left) and optimal denoised (right) image. Noise variance: 0.02. Optimal parameter A* = 
1770.9. 

In order to check the optimality of the computed regularisation parameter A*, we consider the 80 x 80 pixel 
bottom left corner of the noisy image in Figure IT^ In Figure fTT] the values of the cost functional and of the 

Signal to Noise Ratio SNR = 20 x logio ^ ^ > for parameter values between 150 and 1200, are plotted. 

Also the cost functional value corresponding to the computed optimal parameter A* = 885.5 is shown with a 
cross. It can be observed that the computed weight actually corresponds to an optimal solution of the bilevel 
problem. Here we have used h = 1/80 and the other parameters as above. 

The problem presented consists in the optimal choice of the TV regularisation parameter, if the original 
image is known in advance. This is a toy example for proof of concept only. In applications, this image would 
be replaced by a training set of images. 
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Figure 11: Plot of the cost functional value (left) and the SNR (right) vs. the parameter A. Parameters: the 
input is the 80 x 80 pixel crop of the bottom left corner of the noisy image in Figure h = 1/80, 7 = 100, /i = 
le — 12 . The red cross in the plot corresponds to the optimal A* = 885.5. 


K 

A* 


1*501 

1 Serid 1 

eff. 

eff. Dyn.S. 

BFGS its. 

BFGS its. Dyn.S. 

diff. 

10 

3334.5 

3427.7 

2 

3 

140 

84 

7 

21 

2.7% 

20 

3437.0 

3475.1 

4 

4 

240 

120 

7 

15 

1.1% 

30 

3436.5 

3478.2 

6 

6 

420 

180 

7 

15 

1.2% 

40 

3431.5 

3358.3 

8 

9 

560 

272 

7 

16 

2.1% 

50 

3425.8 

3306.4 

10 

10 

700 

220 

7 

11 

3.5% 

60 

3426.0 

3543.4 

12 

12 

840 

264 

7 

11 

3.3% 

70 

3419.7 

3457.7 

14 

14 

980 

336 

7 

12 

1 . 1 % 

80 

3418.1 

3379.3 

16 

16 

1120 

480 

7 

15 

< 1 % 

90 

3416.6 

3353.5 

18 

18 

1260 

648 

7 

18 

2.3% 

100 

3413.6 

3479.0 

20 

20 

1400 

520 

7 

13 

1.9% 


Table 4: Optimal A* estimation for large training sets: computational costs are reduced via Dynamic Sampling 


Algorithm 


Robust estimation with training sets Gaussian noise images typically arise within the framework of 
Magnetic Resonance Imaging (MRI). The challenge in this case consists in training the TV denoising method 
such that with one fixed optimally computed A* clearer images are obtained from noisy acquisitions taken on a 
single MR tomograph with fixed settings. MR images seem to be a natural choice for our methodology, since a 
training set of images is often at hand. Let us consider a training database {(/q , fk)}j^-i ^ 1 of clean 

and noisy images. We modify (24) as: 

K 


1 


mm 

A>0 




2K ^ 

k=l 


(25) 


subject to the set of regularised versions of (24b), for /c = 1,..., LC. 

As explained in [18], dealing with large training sets of images and non-smooth PDF constraints of the 


form (24b) may result is very high computational costs as, in principle, each constraint needs to be solved in 


each iteration of the optimisation loop. On the other hand, in MRI applications, a large database of images is 
desirable in order to make the optimal noise estimation robust. In order to overcome the computational efforts, 
we estimate A* using the Dynamic Sampling Algorithm 

For the following numerical tests, the parameters are chosen as follows: /i = le —12, 7 = 100 and h = 1/150. 
The noise in the images has distribution A/'(0, 0.005) and the accuracy parameter 0 of the Algorithm]^ is chosen 
to be = 0.5. 

Table 1^ shows the numerical values of the optimal parameter A* and AJ computed varying N after solving 
all the PDE constraints and using Dynamic Sampling algorithm, respectively. We measure the efficiency of 
the algorithms in terms of the number of the PDEs solved during the whole optimisation and we compare the 


efficiency of solving (25) subject to the whole set of constraints (24b) with the one where solution is computed 
by means of the Dynamic Sampling strategy, observing a clear improvement. Computing also the relative error 
W^s — Alli/IIA^I 111 we note a good level of accuracy: the error remains always below 5%. 

Figure [ 1 ^ shows an example of database of brain image^together with the optimal denoised version obtained 
by Algorithm for Gaussian noise estimation. 

In order to test the adaptability of our method to images which are very diverse between each other, we 
test our model for a very diversified database see Fig. From Table we can observe that increasing the 
size of the database, the estimation of the optimal parameter A* may vary significantly, due to the diversity of 
images considered. This reflects the property of our approach to estimate the parameter A* which is optimal 


with respect to the entire database, cf. cost functional (25). 


^OASIS online database, http://www.oasis-brains.org/. 

^Berkeley database, available online at: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/BSDS300/html/dataset/images. 


16 



































Figure 12: Sample of 5 images of OASIS MRI brain database: original images (upper row), noisy images 
(middle row) and optimal denoised images (bottom row), = 3280.5. 



Figure 13: Noise-free and noisy versions of images from the Berkeley database. The Gaussian noise distribution 
is 0 mean and variance = 0.01. 


K 

10 

20 

30 

40 

X* 

2732.15 

2766.32 

2170.23 

2292.51 


Table 5: Optimal A* estimation for heterogeneous database, see Fig. 
diversity of the images considered. 
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The numerical value adapts to the 


Poisson noise As a second example, we consider the case of images corrupted by Poisson noise. The corre¬ 
sponding data fidelity in this case has been shown in to be a KL-type fidelity defined as ^(i^) = u — f log u, 
which requires the additional condition for u to be strictly positive. We enforce this constraint by using a 
standard penalty method and solve: 

mm l\\fo-u\\h 

where u is the solution of the minimisation problem: 


mm 

v>0 




/ logv) dx+ ^ 


|min(t>,(5)|||2| 


(26) 


where ^ 1 is a penalty parameter enforcing the positivity constraint and 6 1 ensures strict positivity 


throughout the optimisation. After Huber-regularising the TV term using (2.1), we write the primal-dual form 


of the corresponding optimality condition for the optimisation problem (26) similarly as in (15)-(16) 


f 

fiAu - divq + X {1 - -) + r]XT-, u = 0, 
u ^ 


q = 




max(7|Vi^l, 1) ’ 


(27) 


where 75 is the active set 75 ^ ^ • u{x) < (5}. We then design a modified SSN iteration solving (27) 

similarly as described in Section [3T| see mi Section 4] for more details. Figure shows the optimal denoising 
result for the Poisson noise case in correspondence of the value A* = 1013.76. 


Spatially dependent weight We continue with an example where A is spatially-dependent. Specifically, we 
choose as parameter space V = {u G : dnU = 0 on F} in combination with a TV regulariser and a single 

Gaussian noise model. For this example the noisy image is distorted non-uniformly: A Gaussian noise with 
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Figure 14: Poisson denoising: Original (left), noisy (center) and optimal denoised (right) images. Parameters: 
7 = le3, /i = le — 10, h = 1/128,77 = led. Optimal weight: A* = 1013.76. 


zero mean and variance 0.04 is present on the whole image and an additional noise with variance 0.06 is added 
on the area marked by red line. 

Since the spatially dependent parameter does not allow to get rid of the positivity constraints in an automatic 
way, we solved the whole optimality system by means of the semismooth Newton method described in Section 
3, combined with a Schwarz domain decomposition method. Specifically, we decomposed the domain first and 
apply the globalized Newton algorithm in each subdomain afterwards. The detailed numerical performance of 
this approach is reported in [25]. 

The results are shown in Figure [T^ for the parameters /i = le — 16, 7 = 25 and h = 1/500. The values of A 
on whole domain are between 100.0 to 400.0. From the right image in Figure we can see the dependence of 
the optimal parameter A* on the distribution of noise. As expected, at the high-level noise area in the input 
image, values of A* are lower (darker area) than in the rest of the image. 



Figure 15: Noisy image (left), denoised image (center) and intensity of A* (right). 


5.2 Multiple noise estimation 

In many applications, the acquired image may be possibly corrupted by different types of noise, each one 
corresponding to a different data fidelity term weighted by a non-negative weighting A^. In this multiple 
noise case, we consider the following optimisation lower level problem: 

min I 7 |Vu |||,2 + |Du|(0) + / ... ,^m{u)) dx 

“12 Jn 


where the modelling function T combines the different fidelity terms and weights A^ in order to deal with 
the multiple noise case. The case when T is a linear a linear combination of fidelities with coefficients Xi 
is the one presented in the general model 0 and ( |P^’^| ) and has been considered in m- In the following, we 


present also the case when T is an infimal-convolution operation of fidelities, as considered in m- 


Impulse and Gaussian noise Motivated by some previous work in literature on the use of the infimal- 
convolution operation (5] Chapter 16] for image decomposition, cf. [21] [15], we consider in m the modelling of 
mixed noise distribution through such operation with the intent of obtaining an optimal denoised image thanks 
to the decomposition of the noise into its different components. In the case of combined Gaussian and impulse 
noise, the optimisation model reads: 
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where u is the solution of the optimisation problem: 

mm {|||Vt;||i. + + Ai||n|U. + MWf - v - n||i.} , (28) 

neL‘^ 

where n represents the impulse noise component (and, as such, is treated using the norm) and the optimi¬ 
sation runs over v and n. We use once again a single training pair (/o, /) and consider a Huber-regularisation 
depending on a parameter 7 for both the TV term and the norm in ( [2^ . The corresponding Euler-Lagrange 
equations are: 


—fiAu — div 




max(7| Vrt|, 1) 
7 n 


Ai 


max(7|n|, 1) 


- A2 (/ - w - n) = 0, 

- A2 (/ - w - n) = 0. 


Again, writing the equations above in a primal-dual form, we can write the modified SSN iteration and solve 


the optimisation problem with BEGS as described in Section 3.1 


In Eigurep^we present the results of the model considered. The original image /o has been corrupted with 
Gaussian noise of zero mean and variance 0.005 and then a percentage of 5% of pixels has been corrupted with 
impulse noise. The parameters have been chosen to be 7 = le3, /i = le — 15 and the mesh step size h = 1/120. 
The computed optimal weights are AJ = 351.23 and A2 = 5200.1. The results show the actual decomposition 
of the noise into its sparse and Gaussian components. 



Eigure 16: Impulse-Gaussian denoising. Erom left to right: Original image, noisy image corrupted by impulse 
noise and Gaussian noise with mean zero and variance 0.005, denoised image, impulse noise residuum and 
Gaussian noise residuum. Optimal parameters: A^ = 351.23 and AJ = 5200.1. 


Gaussian and Poisson noise We consider now the optimisation problem with — /p for the 

Gaussian noise component and ^2('^) = {u — flogu) for the Poisson distributed one. We aim to determine the 
optimal weighting (Ai, A2) as follows: 

subject to u be the solution of: 

min {|||Vv||i2 + |£>n|(ll) + y ||n - fWl^ + A2 - flogv) dx^ , (29) 

for one training pair (/o,/), where / corrupted by Gaussian and Poisson noise. After Huber-regularising the 
Total Variation term in ( [^ , we derive (formally) the following Euler-Lagrange equation 

- fiAu - div (-+ Ai('u - /) + A2(1 - -a = 0 

\max(7|Vi4|, 1) / u 

a ' u = 0, 

with non-negative Lagrange multiplier a G L‘^{Q). As in m we multiply the first equation by u and obtain 

u ■ f-iiAu - div (- 7 V lid + Ai(u - /)) +X 2 {u- f)= 0, 

V Vmax(7|Vit|, 1)7 J 

where we have used the complementarity condition a • u = 0. Next, the solution u is computed iteratively by 
using a semismooth Newton type method combined with the outer BEGS iteration as above. 

In Eigure pT| we show the optimisation result. The original image /o has been first corrupted by Poisson 
noise and then Gaussian noise was added, with zero mean and variance 0.001. Choosing the parameter values 
to be 7 = 100 and /i = le — 15, the optimal weights A^ = 1847.75 and A2 = 73.45 were computed on a grid 
with mesh size step h = 1/200. 
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Figure 17: Poisson-Gaussian denoising: Original image (left), noisy image corrupted by Poisson noise and 
Gaussian noise with mean zero and variance 0.001 (center) and denoised image (right). Optimal parameters 
Af = 1847.75 and = 73.45. 


6 Conclusion and outlook 

Machine learning approaches in image processing and computer vision have mostly developed in parallel to their 
mathematical analysis counterparts, which have variational regularisation models at their core. Variational 
regularisation techniques offer rigorous and intelligible image analysis - which gives reliable and stable answers 
that provide us with insight in the constituents of the process and error estimates. This guarantee of giving 
a meaningful and stable result is crucial in most image processing applications, in biomedical and seismic 
imaging, in remote sensing and astronomy: provably giving an answer which is correct up to some error bounds 
is important when diagnosing patients, deciding upon a surgery or when predicting earthquakes. Machine 
learning methods, on the other hand, are extremely powerful as they learn from examples and are hence able 
to adapt to a specific task. The recent rise of deep learning gives us a glimpse on what is possible when 
intelligently using data to learn from. Todays (29 April 2015) search on a Google on ‘deep learning image’ just 
gave 59,800,000 hits. Deep learning is employed for all kinds of image processing and computer vision tasks, 
with impressive results! The weak point of machine learning approaches, however, is that they generally cannot 
offer stability or error bounds, neither provide most of them understanding about the driving factors (e.g. the 
important features in images) that led to their answer. 

In this paper we wanted to give an account to a recent realisation in mathematical image processing that a 
marriage between machine learning and variational regularisation might be interesting - an attempt to bring 
together the Good from both worlds. In particular, we have discussed bilevel optimisation approaches in which 
optimal image regularisers and data fidelity terms are learned making use of a training set. We discussed the 
analysis of such a bilevel strategy in the continuum as well as their efficient numerical solution by quasi-Newton 
methods, and presented numerical examples for computing optimal regularisation parameters for TV, TGV^ 
and ICTV denoising, as well as for deriving optimal data fidelity terms for TV image denoising for data 
corrupted with pure or mixed noise distributions. 

Although the techniques presented in this article are mainly focused on denoising problems, the perspectives 
of using similar approaches in other image reconstruction problems (inpainting, segmentation, etc.) appear to 
be promising. Also the extension to color images deserves to be further studied. 

Finally, there are still several open questions which deserve to be investigated in the future. Here a short 

list: 

• Is it possible to obtain an optimality system for (P) by performing an asymptotic analysis when /i ^ 0? 

• How to measure optimality? Are quality measures such as the signal-to-noise ratio and generalisations 
thereof [84] enough? Should one try to match characteristic expansions of the image such as Fourier or 
Wavelet expansions? m 

• How to decide about the presence of a specific noise model? Is it possible to use sparse optimization for 
the automatic identification of one specific model? Can it be used to identify mixed models? 

7 Acknowledgments 

The original research behind this review has been supported by the King Abdullah University of Science and 
Technology (KAUST) Award No. KUK-II-007-43, the EPSRC grants Nr. EP/J009539/I “Sparse & Higher-order 
Image Restoration”, and Nr. EP/M00483X/I “Efficient computational tools for inverse imaging problems”, the 
Escuela Politecnica Nacional de Quito under award PIS 12-14 and the MATHAmSud project SOCDE “Sparse 


20 




Optimal Control of Differential Equations”. C. Cao and T. Valkonen have also been supported by Prometeo 
scholarships of SENESCYT (Ecuadorian Ministry of Higher Education, Science, Technology and Innovation). 

A data statement for the EPSRC The data leading to this review publication will be made available, as 
appropriate, as part of the original publications that this work summarises. 


References 

[11 W. Allard, Total variation regularization for image denoising, I. Geometric theory. SIAM J. Math. Anal. 
39 (2008) 1150-1190. 

[2] L. Ambrosio, A. Coscia and G. Dal Maso, Eine Properties of Eunctions with Bounded Deformation. Arch. 
Ration. Mech. Anal. 139 (1997) 201-238. 

[3] L. Ambrosio, N. Eusco and D. Pallara, Functions of Bounded Variation and Free Discontinuity Problems^ 
Oxford University Press (2000). 

[4] E. Bans, M. Nikolova and G. Steidl, Eully smoothed Ll-TV models: Bounds for the minimizers and 
parameter choice. J. Math. Imaging Vision 48 (2014) 295-307. 

[5] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 
CMS Books in Mathematics, Springer (2011). 

[6] M. Benning, C. Brune, M. Burger and J. Muller, Higher-Order TV Methods—Enhancement via Bregman 
Iteration. J. Sci. Comput. 54 (2013) 269-310. 

[7] M. Benning and M. Burger, Ground states and singular vectors of convex variational regularization meth¬ 
ods. Methods and Applications of Analysis 20 (2013) 295-334, arXiv:1211.2057, 

[8] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. van Bloe- 
men Waanders, K. Willcox and Y. Marzouk, Large-scale inverse problems and quantification of uncertainty, 
volume 712, John Wiley & Sons (2011). 

[9] K. Bredies and M. Holler, A total variation-based JPEG decompression model. SIAM J. Imaging Sci. 5 
(2012) 366-393. 

[10] K. Bredies, K. Kunisch and T. Pock, Total Generalized Variation. SIAM J. Imaging Sci. 3 (2011) 492-526. 

[11] K. Bredies, K. Kunisch and T. Valkonen, Properties of L^-TGV^: The one-dimensional case. J. Math. 
Anal Appl. 398 (2013) 438-454. 

[12] K. Bredies and T. Valkonen, Inverse problems with second-order total generalized variation constraints, 
in: Proc. SampTA 2011 (2011). 

[13] T. Bui-Thanh, K. Willcox and O. Ghattas, Model reduction for large-scale systems with high-dimensional 
parametric input space. SIAM J. Sci. Comput. 30 (2008) 3270-3288. 

[14] M. Burger, J. Muller, E. Papoutsellis and C.-B. Schonlieb, Total Variation Regularisation in Measurement 
and Image space for PET reconstruction. Inverse Problems 10 (2014). 

[15] M. Burger, K. Papafitsoros, E. Papoutsellis and C.-B. Schonlieb, Infimal convolution regularisation func¬ 
tionals on BV and spaces. Part I: The finite p case (2015), submitted. 

[16] R. H. Byrd, G. M. Chin, J. Nocedal and Y. Wu, Sample size selection in optimization methods for machine 
learning. Math. Program. 134 (2012) 127-155. 

[17] J.-E. Cai, R. H. Chan and M. Nikolova, Two-phase approach for deblurring images corrupted by impulse 
plus gaussian noise. Inverse Probl. Imaging 2 (2008) 187-204. 

[18] L. Calatroni, J. C. De los Reyes and C.-B. Schonlieb, Dynamic sampling schemes for optimal noise learning 
under multiple nonsmooth constraints, in: System Modeling and Optimization, 85-95, Springer Verlag 
(2014). 

[19] L. Calatroni, J. C. D. los Reyes and C.-B. Schonlieb, Learning the optimal Total Variation denoising model 
for multiple noise distributions, in preparation. 

[20] V. Caselles, A. Chambolle and M. Novaga, The discontinuity set of solutions of the TV denoising problem 
and some extensions. Multiscale Model. Simul. 6 (2007) 879-894. 


21 


[21] A. Chambolle and P.-L. Lions, Image recovery via total variation minimization and related problems. 
Numer. Math. 76 (1997) 167-188. 

[22] Y. Chen, T. Pock and H. Bischof, Learning ^i-based analysis and synthesis sparsity priors using bi-level 
optimization, in: Workshop on Analysis Operator Learning vs. Dietionary Learning, NIPS 2012 (2012). 

[23] Y. Chen, R. Ranftl and T. Pock, Insights into analysis operator learning: From patch-based sparse models 
to higher-order MRFs. Image Proeessing, IEEE Transaetions on (2014), to appear. 

[24] Y. Chen, W. Yu and T. Pock, On learning optimized reaction diffusion processes for effective image 
restoration, in: IEEE Conferenee on Computer Vision and Pattern Reeognition (2015), to appear. 

[25] C. V. Chung and J. C. De los Reyes, Learning optimal spatially-dependent regularization parameters in 
total variation image restoration, in preparation. 

[26] J. Chung, M. Chung and D. P. O’Leary, Designing optimal spectral filters for inverse problems. SIAM J. 
Sei. Comput. 33 (2011) 3132-3152. 

[27] J. Chung, M. 1. Espahol and T. Nguyen, Optimal Regularization Parameters for General-Form Tikhonov 
Regularization. arXiv preprint arXiv: 1407.1911 (2014). 

[28] A. Cichocki, S.-i. Amari et ah. Adaptive Blind Signal and Image Proeessing, John Wiley Chichester (2002). 

[29] R. Costantini and S. Susstrunk, Virtual sensor design, in: Eleetronie Imaging 2004^ 408-419, International 
Society for Optics and Photonics (2004). 

[30] J. C. De los Reyes, Numerieal PDE-Constrained Optimization, Springer (2015). 

[31] J. C. De los Reyes and C.-B. Schonlieb, Image denoising: Learning the noise model via Nonsmooth PDF- 
constrained optimization. Inverse Probl. Imaging 7 (2013). 

[32] J. C. De los Reyes, C.-B. Schonlieb and T. Valkonen, Optimal parameter learning for higher-order total 
variation regularisation models, in preparation. 

[33] J. C. De los Reyes, C.-B. Schonlieb and T. Valkonen, The structure of optimal parameters for image 
restoration problems, in preparation. 

[34] J. Domke, Generic methods for optimization-based modeling, in: International Conferenee on Artifieial 
Intelligenee and Statisties, 318-326 (2012). 

[35] Y. Dong, M. Hintermiiller and M. M. Rincon-Gamacho, Automated regularization parameter selection in 
multi-scale total variation models for image restoration. J. Math. Imaging Vision 40 (2011) 82-104. 

[36] H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, volume 375, Springer (1996). 

[37] S. N. Evans and P. B. Stark, Inverse problems as statistics. Inverse Problems 18 (2002) R55. 

[38] A. Foi, Glipped noisy images: Heteroskedastic modeling and practical denoising. Signal Proeessing 89 
(2009) 2609 - 2629, special Section: Visual Information Analysis for Security. 

[39] K. Frick, P. Marnitz, A. Munk et ah. Statistical multiresolution Dantzig estimation in imaging: Funda¬ 
mental concepts and algorithmic framework. Eleetronie Journal of Statisties 6 (2012) 231-268. 

[40] G. Gilboa, A total variation spectral framework for scale and texture analysis. SIAM J. Imaqinq Sei. 7 
(2014) 1937-1961. 

[41] E. Haber, L. Horesh and L. Tenorio, Numerical methods for the design of large-scale nonlinear discrete 
ill-posed inverse problems. Inverse Problems 26 (2010) 025002. 

[42] E. Haber and L. Tenorio, Learning regularization functionals - a supervised training approach. Inverse 
Problems 19 (2003) 611. 

[43] M. Hintermiiller and A. Langer, Subspace Gorrection Methods for a Glass of Nonsmooth and Nonadditive 
Gonvex Variational Problems with Mixed L^ /1? Data-Fidelity in Image Processing. SIAM J. Imaqinq Sei. 
6 (2013) 2134-2173. 

[44] M. Hintermiiller and G. Stadler, An Infeasible Primal-Dual Algorithm for Total Bounded Variation-Based 
Inf-Gonvolution-Type Image Restoration. SIAM J. Sei. Comput. 28 (2006) 1-23. 


22 



[45] M. Hintermuller and T. Wu, Bilevel Optimization for Calibrating Point Spread Functions in Blind Decon¬ 
volution (2014), preprint. 

[46] H. Huang, E. Haber, L. Horesh and J. K. Seo, Optimal Estimation Of L 1-regularization Prior Erom A 
Regularized Empirical Bayesian Risk Standpoint. Inverse Probl. Imaging 6 (2012). 

[47] J. Idier, Bayesian approaeh to inverse problems^ John Wiley & Sons (2013). 

[48] A. Jezierska, E. Chouzenoux, J.-C. Pesquet and H. Talbot, A Convex Approach for Image Restoration 
with Exact Poisson-Gaussian Likelihood, Technical report (2013). 

[49] J. Kaipio and E. Somersalo, Statistical and computational inverse problems^ volume 160, Springer Science 
& Business Media (2006). 

[50] N. Kingsbury, Complex wavelets for shift invariant analysis and filtering of signals. Applied and Computa¬ 
tional Harmonic Analysis 10 (2001) 234-253. 

[51] T. Klatzer and T. Pock, Continuous Hyper-parameter Learning for Support Vector Machines, in: Computer 
Vision Winter Workshop (CVWW) (2015). 

[52] E. Knoll, K. Bredies, T. Pock and R. Stollberger, Second order total generalized variation (TGV) for MRI. 
Magnetic Resonance in Medicine 65 (2011) 480-491. 

[53] V. Kolehmainen, T. Tarvainen, S. R. Arridge and J. P. Kaipio, Marginalization of uninteresting distributed 
parameters in inverse problems—application to diffuse optical tomography. International Journal for Un¬ 
certainty Quantification 1 (2011). 

[54] K. Kunisch and T. Pock, A bilevel optimization approach for parameter learning in variational models. 
SIAM J. Imaging Sci. 6 (2013) 938-983. 

[55] J. Lehmann, D. Lorenz, C.-B. Schonlieb and T. Valkonen, Imaging with Kantorovich-Rubinstein discrep¬ 
ancy. SIAM J. Imaging Sci. 7 (2014) 2833-2859, arXiv:1407.0221, 

[56] J. Mairal, E. Bach, J. Ponce and G. Sapiro, Online dictionary learning for sparse coding, in: Proceedings 
of the 26th Annual International Conference on Machine Learning^ 689-696, ACM (2009). 

[57] J. Mairal, B. E, J. Ponce, G. Sapiro and A. Zisserman, Discriminative learned dictionaries for local image 
analysis. CVPR (2008). 

[58] Y. Meyer, Oscillating patterns in image processing and nonlinear evolution equations^ AMS (2001). 

[59] D. Mumford and B. Gidas, Stochastic models for generic images. Quarterly of Applied Mathematics 59 
(2001) 85-112. 

[60] E. Natterer and E. Wiibbeling, Mathematical Methods in Image Reconstruction^ Monographs on Mathe¬ 
matical Modeling and Gomputation Vol 5, Philadelphia, PA: SIAM) (2001). 

[61] M. Nikolova, A variational approach to remove outliers and impulse noise. J. Math. Imaging Vision 20 
(2004) 99-120. 

[62] J. Nocedal and S. Wright, Numerical Optimization^ Springer Series in Operations Research and Einancial 
Engineering, Springer (2006). 

[63] P. Ochs, R. Ranftl, T. Brox and T. Pock, Bilevel Optimization with Nonsmooth Lower Level Problems, in: 
International Conference on Scale Space and Variational Methods in Computer Vision (SSVM) (2015), to 
appear. 

[64] B. Olshausen and D. Eield, Emergence of simple-cell receptive field properties by learning a sparse code 
for natural images. Nature 381 (1996) 607-609. 

[65] K. Papafitsoros and K. Bredies, A study of the one dimensional total generalised variation regularisation 
problem. arXiv preprint arXiv:1309.5900 (2013). 

[66] G. Peyre and J. M. Eadili, Learning analysis sparsity priors. SamptaJl (2011). 

[67] R. Ranftl and T. Pock, A Deep Variational Model for Image Segmentation, in: 36th German Conference 
on Pattern Recognition (GCPR) (2014). 


23 


[68] W. Ring, Structural Properties of Solutions to Total Variation Regularization Problems. ESAIM: Math. 
Model. Numer. Anal. 34 (2000) 799-810. 

[69] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis^ Springer (1998). 

[70] S. Roth and M. J. Black, Fields of experts: A framework for learning image priors, in: Computer Vision 
and Pattern Reeognition, 2005. CVPR 2005. IEEE Computer Soeiety Conference on^ volume 2, 860-867, 
IEEE (2005). 

[71] L. I. Rudin, S. Osher and E. Eatemi, Nonlinear total variation based noise removal algorithms. Physica D: 
Nonlinear Phenomena 60 (1992) 259-268. 

[72] A. Sawatzky, C. Brune, J. Muller and M. Burger, Total Variation Processing of Images with Poisson 
Statistics, in: Computer Analysis of Images and Patterns^ Lecture Notes in Computer Science^ volume 
5702, Edited by X. Jiang and N. Petkov, 533-540, Springer Berlin Heidelberg (2009). 

[73] L. L. Scharf, Statistical Signal Processing^ volume 98, Addison-Wesley Reading, MA (1991). 

[74] U. Schmidt and S. Roth, Shrinkage fields for effective image restoration, in: Computer Vision and Pattern 
Recognition (CVPR), 2014 IEEE Conference on, 2774-2781, IEEE (2014). 

[75] J.-L. Starck, E. D. Murtagh and A. Bijaoui, Image restoration with noise suppression using a wavelet 
transform and a multiresolution support constraint (1994). 

[76] E. Tadmor, S. Nezzar and L. Vese, A multiscale image representation using hierarchical (BV, L 2) decom¬ 
positions. Multiscale Model. Simul. 2 (2004) 554-579. 

[77] M. E. Tappen, Utilizing variational optimization to learn Markov random fields, in: Computer Vision and 
Pattern Recognition, 2007. CVPR^OJ. IEEE Conference on, 1-8, IEEE (2007). 

[78] R. Temam, Mathematical problems in plasticity, Gauthier-Villars (1985). 

[79] M. Unser, Texture classification and segmentation using wavelet frames. Image Processing, IEEE Trans¬ 
actions on 4 (1995) 1549-1560. 

[80] M. Unser and N. Chenouard, A unifying parametric framework for 2D steerable wavelet transforms. SIAM 
J. Imaging Sci. 6 (2013) 102-135. 

[81] T. Valkonen, The jump set under geometric regularisation. Part 1: Basic technique and first-order denois- 
ing. SIAM J. Math. Anal. (2015), accepted, arXiv: 1407.1531, 

[82] Y. Vardi, L. Shepp and L. Kaufman, A statistical model for positron emission tomography. Journal of the 
American Statistical Association 80 (1985) 8-20. 

[83] E. Viola, A. Eitzgibbon and R. Cipolla, A unifying resolution-independent formulation for early vision, in: 
Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 494-501, IEEE (2012). 

[84] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: Erom error visibility 
to structural similarity. IEEE Trans. Image Processing 13 (2004) 600-612. 

[85] G. Yu, G. Sapiro and S. Mallat, Image modeling and enhancement via structured sparse model selection. 
Proc. IEEE Int. Conf. Image Processing (2010). 


24 


